I was trying to edit some SU file headers in Seismic Unix. If you've ever used suchw you will know it is not the most intuitive module. So I started playing around with struct in order to pull the SU file into python, and kind of fell by accident into numpy's structured arrays. I never looked back.
If you're not familiar with structured arrays, they allow you to define a numpy data type made up of named fields. Each field can have its own type - short or long, signed or unsigned integers, floats, and so on.
Put together a data type with fields that match the Seismic Unix 240-byte header, add the trace data as another field, and all of a sudden the SU file is the exact binary representation of the numpy array in memory (assuming you've got the endianness correct). Reading a Seismic Unix file becomes as simple as
dataset = numpy.fromfile('dataset.su', dtype=su)
and editing headers becomes as simple as
dataset['cdp'] = (dataset['sx'] + dataset['gx'])/2.0
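For the sake of completeness, here is a rough sketch of what that su dtype could look like. Treat it as illustrative rather than canonical - the real header defines around 90 words, so only the fields used in these snippets are spelled out (the rest are lumped into void padding to keep the byte offsets honest), and the field and helper names below (ns, dt, trace, su_dtype) are just the ones used in this sketch:
import numpy as np

def su_dtype(ns, endian='<'):
    # One SU trace is a 240-byte binary header followed by ns float32 samples.
    # Only a few header words are named; the void ('V...') fields pad out the
    # rest so the named fields sit at the correct byte offsets.
    return np.dtype([
        ('tracl',  endian + 'i4'),     # bytes 0-3     trace sequence number within line
        ('tracr',  endian + 'i4'),     # bytes 4-7     trace sequence number within file
        ('fldr',   endian + 'i4'),     # bytes 8-11    field record number
        ('tracf',  endian + 'i4'),     # bytes 12-15   trace number within field record
        ('ep',     endian + 'i4'),     # bytes 16-19   energy source point number
        ('cdp',    endian + 'i4'),     # bytes 20-23   CDP ensemble number
        ('cdpt',   endian + 'i4'),     # bytes 24-27   trace number within CDP ensemble
        ('pad1',   'V8'),              # bytes 28-35   header words not needed here
        ('offset', endian + 'i4'),     # bytes 36-39   source-receiver offset
        ('pad2',   'V32'),             # bytes 40-71   header words not needed here
        ('sx',     endian + 'i4'),     # bytes 72-75   source x coordinate
        ('sy',     endian + 'i4'),     # bytes 76-79   source y coordinate
        ('gx',     endian + 'i4'),     # bytes 80-83   receiver group x coordinate
        ('gy',     endian + 'i4'),     # bytes 84-87   receiver group y coordinate
        ('pad3',   'V26'),             # bytes 88-113  header words not needed here
        ('ns',     endian + 'u2'),     # bytes 114-115 number of samples per trace
        ('dt',     endian + 'u2'),     # bytes 116-117 sample interval in microseconds
        ('pad4',   'V122'),            # bytes 118-239 remainder of the 240-byte header
        ('trace',  endian + 'f4', ns), # the trace samples themselves
    ])

su = su_dtype(ns=1001)   # e.g. 1001 samples per trace
In practice ns has to be read (or known) before the full dtype can be built, since it fixes the length of the trace field, and the endian argument is there for files written on a machine with the opposite byte order.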
Of course, this is just the tip of the iceberg. The seismic dataset is now a numpy array, and the full suite of python tools - numpy, scipy, pandas, etc. - is on tap. Consider the following example:
import numpy as np
data = np.fromfile('raw.su', dtype=su)
data['cdp'] = (data['sx'] + data['gx'])/2.0
cdps = np.sort(data, order=['cdp', 'offset'])
fft = np.fft.rfft(cdps['trace'])            # FFT the trace samples field
fft[..., 0] = 0                             # zero the DC (zero-frequency) bin of every trace
cdps['trace'] = np.fft.irfft(fft, n=cdps['trace'].shape[-1])
cdps.tofile('cleaned.su')                   # headers and cleaned traces, back out as SU
In 8 lines the SU file is imported, the header word CDP is calculated and set, the dataset is sorted into cdp/offset order, has any DC bias removed in the frequency domain, and is written back out to another SU file. The only thing in that code block that is not off-the-shelf python is the dtype definition, i.e. dtype=su.
Like I said, I never looked back.
I very quickly built up a collection of python code which could be used in conjunction with Seismic Unix. PySeis (PYthonic SEISmic processing) is an attempt to integrate that code into something more cohesive.
But as I started giving a short course in seismic processing it also became a teaching tool, and I have slowly been recoding the core modules to make them easier to teach. That has meant making some trade-offs, particularly around dependencies.
Initially I tied the dependencies to the free EPD distribution, but I moved away from that when Canopy was released. Long term I will probably tie them to either Enthought's Canopy or Continuum's Anaconda; which one will probably depend on the IT department at the University of Queensland.
Currently the goal is to include sufficient code to allow for a complete but relatively basic processing sequence, along with ipython notebooks for each process. The notebooks will hopefully become the primary teaching tool in future.